

Knowledge Transfer in Multi-Task Deep Reinforcement Learning for Continuous Control

Neural Information Processing Systems

While Deep Reinforcement Learning (DRL) has emerged as a promising approach to many complex tasks, it remains challenging to train a single DRL agent that is capable of undertaking multiple different continuous control tasks. In this paper, we present a Knowledge Transfer based Multi-task Deep Reinforcement Learning framework (KTM-DRL) for continuous control, which enables a single DRL agent to achieve expert-level performance in multiple different tasks by learning from task-specific teachers. In KTM-DRL, the multi-task agent first leverages an offline knowledge transfer algorithm designed particularly for the actor-critic architecture to quickly learn a control policy from the experience of task-specific teachers, and then it employs an online learning algorithm to further improve itself by learning from new online transition samples under the guidance of those teachers. We perform a comprehensive empirical study with two commonly used benchmarks in the MuJoCo continuous control task suite. The experimental results demonstrate the effectiveness of KTM-DRL and its knowledge transfer and online learning algorithms, as well as its superiority over the state of the art by a large margin.
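The offline knowledge transfer step described in the abstract can be illustrated with a minimal sketch. Here the teacher and student actors are stand-in linear maps (the paper uses TD3-style networks), and the loss shown is a plain behavioral-cloning MSE between teacher and student actions on the same batch of states; the shapes and names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical batch of states drawn from a task-specific replay buffer.
states = rng.normal(size=(32, 8))

# Stand-ins for the frozen task-specific teacher actor and the
# multi-task student actor (linear maps keep the sketch small).
W_teacher_actor = rng.normal(size=(8, 2))
W_student_actor = rng.normal(size=(8, 2))

def actor_distillation_loss(states, W_teacher, W_student):
    """MSE between the teacher's actions and the student's imitated actions."""
    a_teacher = states @ W_teacher  # frozen teacher actions (no gradient)
    a_student = states @ W_student  # student actions to be fit to the teacher
    return float(np.mean((a_student - a_teacher) ** 2))

loss = actor_distillation_loss(states, W_teacher_actor, W_student_actor)
```

In KTM-DRL this offline imitation is paired with critic-side transfer and followed by the online stage, where the student keeps learning from fresh transitions under teacher guidance.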


Review for NeurIPS paper: Knowledge Transfer in Multi-Task Deep Reinforcement Learning for Continuous Control

Neural Information Processing Systems

Additional Feedback: Describing RL as the 'de facto' approach to complex tasks could be phrased a bit more humbly. Many other approaches address 'complex tasks', and even if we limit ourselves to continuous control tasks, there is a considerable community working on optimal control that should not be ignored. Similarly, there is significant work on multi-task RL, such that 'scant attention has been paid' is in part incorrect. Using the name 'ideal solution' for solving the individual tasks independently is inaccurate, as no transfer can happen between the tasks that could improve performance. Given that the description mentions that TD3 uses two critic networks, it would be helpful to mention their purpose for more consistency.


Review for NeurIPS paper: Knowledge Transfer in Multi-Task Deep Reinforcement Learning for Continuous Control

Neural Information Processing Systems

The work proposes a simple multi-task RL approach to continuous control through two-stage training: an offline stage with policy distillation and an online stage that fine-tunes the meta-policy with online transitions collected by interacting with the actual environment. The paper is overall well written and easy to understand. Reviewers appreciate the extensive experiments and ablation studies demonstrating the effectiveness of the proposed approach. It is encouraging to see such a simple framework achieve a significant boost over the state-of-the-art multi-task RL approach.




Not All Tasks Are Equally Difficult: Multi-Task Deep Reinforcement Learning with Dynamic Depth Routing

He, Jinmin, Li, Kai, Zang, Yifan, Fu, Haobo, Fu, Qiang, Xing, Junliang, Cheng, Jian

arXiv.org Artificial Intelligence

Multi-task reinforcement learning endeavors to accomplish a set of different tasks with a single policy. To enhance data efficiency by sharing parameters across multiple tasks, a common practice segments the network into distinct modules and trains a routing network to recombine these modules into task-specific policies. However, existing routing approaches employ a fixed number of modules for all tasks, neglecting that tasks with varying difficulties commonly require varying amounts of knowledge. This work presents a Dynamic Depth Routing (D2R) framework, which learns strategic skipping of certain intermediate modules, thereby flexibly choosing different numbers of modules for each task. Under this framework, we further introduce a ResRouting method to address the issue of disparate routing paths between behavior and target policies during off-policy training. In addition, we design an automatic route-balancing mechanism to encourage continued routing exploration for unmastered tasks without disturbing the routing of mastered ones. We conduct extensive experiments on various robotics manipulation tasks in the Meta-World benchmark, where D2R achieves state-of-the-art performance with significantly improved learning efficiency.
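The strategic module-skipping that D2R describes can be sketched as a gated forward pass: each task carries a binary route over a shared stack of modules, and modules whose gate is off are skipped, passing the representation through unchanged. The linear modules, gates, and shapes below are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stack of shared modules (linear layers here for brevity).
modules = [rng.normal(size=(4, 4)) for _ in range(3)]

def route(x, gate):
    """Apply module i only where gate[i] is 1; skipped modules pass x through."""
    for W, g in zip(modules, gate):
        if g:
            x = np.tanh(x @ W)
    return x

x = rng.normal(size=(4,))
easy = route(x, gate=[1, 0, 0])  # shallow path for an easier task
hard = route(x, gate=[1, 1, 1])  # deeper path for a harder task
```

In D2R the route itself is learned per task rather than fixed, with ResRouting handling mismatched behavior/target routes during off-policy training and a route-balancing mechanism keeping exploration alive for unmastered tasks.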